1. Introduction


Multicarrier signaling schemes such as orthogonal frequency division multiplexing (OFDM) are highly susceptible to nonlinear distortion at all stages of the transmission process. This is partly due to the impulsive nature of these signals in the time domain, where the superposition of modulated waveforms takes place. When nonlinear distortion is confined to the transmitter, many proposed methods in the literature use reserved carriers (or tones) to carry information about this distortion to the receiver, at the obvious cost of reducing data-rate EHS]. The central idea in these techniques is to construct clipping (or peak-reducing) signals by performing a constrained search at the transmitter, one which confines the frequency support of the clipping signal to the reserved carriers, while reducing the peaks of the data signal in the time domain. These approaches are not generally robust, as they demand that the frequency support of the data and clipping signals remain strictly disjoint throughout the transmission process, and they add significant complexity at the transmitter.

To combat this, techniques based on compressive sensing (CS) that were tuned to clipped OFDM models were proposed in [6]. These techniques removed the need for any constrained search at the transmitter, since the receiver could detect the entire clipping signal by observing a subset of its frequency components available on the reserved tones, provided that the signal is sparse in time. Consequently, the need to maintain orthogonality in frequency was completely relaxed, but the need for a significant number of reserved carriers persisted. To avoid this loss in data-rate, the authors in [7] proposed using the channel estimation pilots for this purpose. Nonetheless, this approach severely limits the number of measurements available to the CS algorithm and hence its ability to deal with severe clipping scenarios. It also does not make use of available information such as clipping likelihood and phase resemblance in the time domain (i.e., phase resemblance between the clipped and clipping signal) as done in [6].

In this paper, a fundamentally different approach is pursued. Specifically, in contrast to the authors' previous work on the topic of using CS concepts in OFDM [6], the technique presented in this paper does not require any orthogonality between the frequency support of the data and the distortion, and no reserved tones (null, edge, channel pilots, or otherwise) are needed either. The receiver is free to select which and how many data tones it will use to read off differential observations, and will use them to estimate and cancel the entire distortion over the tones. In addition, no data-rate is lost by employing the proposed strategy. Furthermore, a significant tradeoff also exists in regard to complexity, distortion tolerance, and robustness to channel estimation errors, so that the user has many algorithms to use within the proposed framework. Similarly, in contrast to another recent work ^ , the current paper introduces an entirely new and rigorous way of analyzing the reliability of tones. In addition, it also introduces a method to fine-tune the performance of CS by minimizing the probability of incorrect measurements and maximizing the clipping-to-noise ratio (CNR).

This framework is made possible by jointly taking three phenomena into account. The first is that not all tones carry correct decoding information to the receiver. The second is that the receiver can probabilistically assign levels of confidence to each tone, and the third is that the distortion is sparse in the time domain. These phenomena motivate us to employ CS and sparse recovery techniques much more effectively compared to previous techniques, as we can quantify the perturbations on data tones, select the most reliable ones, and then, most importantly, use the power of CS techniques to recover the significant time-domain distortions. Our major contributions include:

1. Formulating CS models within a pilotless transmission framework. 

2. Proposing systematic methods to adaptively select the tone subset to sense 
over, among the combinatorially large possibilities of pilotless CS models. 

3. Developing a novel method for accurately assessing the reliability of estimated coefficients based on their symbol-wise magnitudes, phases, and relative locations to other constellation points, as well as channel strengths.

4. Deriving a closed-form expression that characterizes the modes of behavior of the reliability function, and devising geometrically-inspired approximations based on this expression for quick and efficient selection of the tone subset over each OFDM block.

5. Proposing dual-stage construction of the tone subset, where the first stage minimizes the probability of incorrect measurements, while the second maximizes a CNR metric to optimize CS performance.

6. Providing probabilistic upper bounds for choosing the number of tones in 
the CS model without risking incorrect measurements. 

The remainder of the paper is organized as follows. Section 2 briefly describes the transmission and distortion models. Section 3 demonstrates how a pilotless CS model can be derived within the previous transmission model. Subsequently, Section 4, the heart of the paper, focuses on selecting the subset of tones used for CS. This includes developing reliability assessment criteria, deriving analytical approximations for quick and efficient assessment, selecting the number of tones, establishing dual-stage subset selection to maximize CS performance, and, finally, condensing the major results into an algorithm. The subsequent sections present our simulations and conclude the paper.

1.1. Notation 

We use regular font for scalars and boldface letters for matrices and vectors. To distinguish between vectors in the time and frequency domains, we use boldface calligraphic notation for vectors in the frequency domain (e.g., $\boldsymbol{\mathcal{X}}, \bar{\boldsymbol{\mathcal{X}}}, \boldsymbol{\mathcal{Y}}$) and boldface lowercase letters for their corresponding time-domain representations (e.g., $\mathbf{x}, \bar{\mathbf{x}}, \mathbf{y}$). We use $\mathcal{X}(k)$ to denote the $k$th coefficient of $\boldsymbol{\mathcal{X}}$, or more simply $\mathcal{X}$, when it is clear from the context. Moreover, we use $\boldsymbol{\mathcal{X}}_{\Omega}$ to represent a vector formed by selecting the coefficients of $\boldsymbol{\mathcal{X}}$ indexed by the set $\Omega$. Similarly, $\boldsymbol{\mathcal{Y}}_{\Omega}$ is the vector formed by indexing the corresponding elements of the vector $\boldsymbol{\mathcal{Y}}$ according to the index set $\Omega$. We further define $\mathbf{S}_{\Omega}$ to be a diagonal binary selection matrix, with $|\Omega|$ ones at locations along its diagonal specified by the tone set $\Omega$.

2. Transmission and Clipping Model 

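In an OFDM system, serially incoming bits are mapped into an $M$-ary QAM alphabet $\mathcal{A} = \{\mathcal{A}_0, \mathcal{A}_1, \ldots, \mathcal{A}_{M-1}\}$ and concatenated to form an $N$-dimensional data vector, $\boldsymbol{\mathcal{X}} = [\mathcal{X}(0), \mathcal{X}(1), \ldots, \mathcal{X}(N-1)]^{\mathrm{T}} \in \mathcal{A}^N$. The time-domain signal $\mathbf{x}$ is obtained by an IFFT operation such that $\mathbf{x} = \mathbf{F}^{\mathrm{H}} \boldsymbol{\mathcal{X}}$, where

$$F(k, l) = N^{-1/2} e^{-j 2\pi k l / N}, \qquad k, l \in \{0, 1, \ldots, N-1\}.$$

Since $\mathbf{x}$ has a high peak-to-average power ratio (PAPR), the digital samples are subject to a magnitude limiter that saturates its operands to a value of $\gamma$. Hence, instead of $\mathbf{x}$, we feed $\bar{\mathbf{x}}$ to the power amplifier, where

$$\bar{x}(i) = \begin{cases} \gamma e^{j\theta_{x(i)}} & \text{if } |x(i)| > \gamma, \\ x(i) & \text{otherwise,} \end{cases} \qquad (1)$$

and where $\theta_{x(i)}$ is the phase of $x(i)$. This hard-limiting operation can be conveniently thought of as adding a peak-reducing signal $\mathbf{c}$ to $\mathbf{x}$ so that its low-PAPR counterpart $\bar{\mathbf{x}} = \mathbf{x} + \mathbf{c}$ is transmitted instead. Furthermore, by setting a typical clipping threshold, $\gamma$, on $\mathbf{x}$, $\mathbf{c}$ is controllably sparse in time by the impulsive nature of $\mathbf{x}$, and dense in frequency by the uncertainty principle. We denote the temporal support of $\mathbf{c}$ by $\mathcal{I}_c = \{i : c(i) \neq 0\}$ and always maintain the practical assumption that $|\mathcal{I}_c| \ll N$.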
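The clipping model in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block length, the QPSK alphabet, the fixed random seed, and the threshold (80% of the largest sample magnitude) are arbitrary choices made here, and the helper names `ifft_unitary` and `clip` are hypothetical.

```python
import cmath
import math
import random


def ifft_unitary(X):
    """Unitary inverse DFT, x = F^H X, with F(k, l) = exp(-j*2*pi*k*l/N) / sqrt(N)."""
    N = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * l / N) for k in range(N)) / math.sqrt(N)
        for l in range(N)
    ]


def clip(x, gamma):
    """Hard-limit each sample to magnitude gamma while keeping its phase, as in (1)."""
    return [gamma * cmath.exp(1j * cmath.phase(v)) if abs(v) > gamma else v for v in x]


random.seed(0)  # fixed seed so the sketch is repeatable
N = 64          # assumed block length
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
X = [random.choice(qpsk) for _ in range(N)]

x = ifft_unitary(X)
gamma = 0.8 * max(abs(v) for v in x)      # illustrative threshold (an assumption)
x_bar = clip(x, gamma)
c = [b - a for a, b in zip(x, x_bar)]     # peak-reducing signal, x_bar = x + c
support = [i for i, v in enumerate(c) if abs(v) > 1e-12]  # temporal support of c
```

On the support, each entry of `c` has magnitude `abs(x[i]) - gamma` and points opposite to `x[i]`; everywhere else it is exactly zero, which is the sparsity that the later CS model relies on.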

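Subsequently, $\bar{\mathbf{x}}$ is convolved with a channel of impulse response $\mathbf{h}$, modeled as zero-mean circularly symmetric complex Gaussian, and subjected to additive white Gaussian noise (AWGN) $\mathbf{z}$, likewise zero-mean complex Gaussian, where $L_h$ is the length of the channel impulse response. Equivalently, in the frequency domain, this translates to transmitting

$$\bar{\boldsymbol{\mathcal{X}}} = \boldsymbol{\mathcal{X}} + \boldsymbol{\mathcal{C}}, \qquad (2)$$

with complex coefficients that are now randomly pre-perturbed from the lattice $\mathcal{A}^N$, followed by additional multiplicative perturbations by the channel $\mathbf{H}$ and additive perturbations by the noise $\boldsymbol{\mathcal{Z}}$ at the receiver. By virtue of the added cyclic prefix (of length $> L_h$), the circulant channel matrix $\mathbf{H}$ can be decomposed and expressed as $\mathbf{H} = \mathbf{F}^{\mathrm{H}} \boldsymbol{\Lambda} \mathbf{F}$, where $\boldsymbol{\Lambda}$ is an $N \times N$ diagonal matrix composed of the frequency-domain channel gains, $\{\Lambda(k)\}_{k=0}^{N-1}$. As a result, the frequency-domain received signal reads $\boldsymbol{\mathcal{Y}} = \boldsymbol{\Lambda} \bar{\boldsymbol{\mathcal{X}}} + \boldsymbol{\mathcal{Z}}$, where, for the moment, we make the practical assumption that the channel coefficients are known at the receiver. Consequently, $\hat{\boldsymbol{\mathcal{X}}}$ can be directly recovered scalar-wise from $\boldsymbol{\mathcal{Y}}$, i.e.,

$$\hat{\mathcal{X}}(k) = \Lambda^{-1}(k) \mathcal{Y}(k) = \mathcal{X}(k) + \mathcal{C}(k) + \Lambda^{-1}(k) \mathcal{Z}(k), \qquad (3)$$

where we use the notation $\hat{\boldsymbol{\mathcal{X}}}$ to represent the equalized estimate of $\boldsymbol{\mathcal{X}}$ at the receiver. Writing (3) in vector notation yields

$$\hat{\boldsymbol{\mathcal{X}}} = \boldsymbol{\mathcal{X}} + \boldsymbol{\mathcal{C}} + \boldsymbol{\Lambda}^{-1} \boldsymbol{\mathcal{Z}}. \qquad (4)$$

Let $\mathcal{D}(k) = \mathcal{C}(k) + \Lambda^{-1}(k) \mathcal{Z}(k)$ denote the general distortion on $\mathcal{X}(k)$. We let $f_{\mathcal{D}}$ be the pdf of the general distortion $\mathcal{D}$, which we assume to be zero-mean circularly symmetric Gaussian with variance $\sigma_{\mathcal{D}}^2$ (please refer to Appendix A for details regarding the derivation of $\sigma_{\mathcal{D}}^2$). Equation (4) could now be written as

$$\hat{\boldsymbol{\mathcal{X}}} = \boldsymbol{\mathcal{X}} + \boldsymbol{\mathcal{D}}. \qquad (5)$$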

Treating the clipping distortion as additive noise, a maximum likelihood (ML) decoder will recover $\mathcal{X}(k)$ by simply mapping $\hat{\mathcal{X}}(k)$ to the nearest constellation point¹ $\langle \hat{\mathcal{X}}(k) \rangle$, where $\langle \hat{\mathcal{X}}(k) \rangle = \arg\min_{\mathcal{A}_m \in \mathcal{A}} |\hat{\mathcal{X}}(k) - \mathcal{A}_m|$. In other words, the operation $\langle \hat{\mathcal{X}}(k) \rangle$ corresponds to rounding $\hat{\mathcal{X}}(k)$ to the nearest constellation point. Note that $\langle \hat{\mathcal{X}}(k) \rangle$ does not need to be the true constellation point, i.e., $\langle \hat{\mathcal{X}}(k) \rangle$ might be different from $\mathcal{X}(k)$. Such a hard-decoding scheme is very efficient in the classical AWGN scenario for high signal-to-noise ratio (SNR). However, in our case, in addition to the additive noise $\Lambda^{-1}(k) \mathcal{Z}(k)$, we have a $\gamma$-dependent source of perturbation $\mathcal{C}(k)$ which is independent of the SNR. CS and similar sparse recovery algorithms seem to be a very sensible solution for the recovery of $\mathcal{C}(k)$. Since $\mathbf{c}$ is sparse in the time domain, a partial observation of $\mathbf{c}$ in the frequency domain is sufficient to estimate $\mathbf{c}$ and hence $\boldsymbol{\mathcal{C}}$ in one shot. This would certainly get around the problem of unreliable perturbations, as CS algorithms, for instance, can be totally blind to them and still offer near-optimal signal reconstruction under mild conditions [5]. The main issue is to decide which partial observation to use. This will be the topic of the following section.
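The rounding operation and its failure mode can be illustrated with a short Python sketch. The 16-QAM grid with levels in {-3, -1, 1, 3} and the two perturbations are assumptions chosen for illustration, and `nearest_point` is a hypothetical helper name.

```python
def nearest_point(z, levels=(-3, -1, 1, 3)):
    """Round a complex sample to the nearest point of a square QAM grid."""
    re = min(levels, key=lambda a: abs(z.real - a))
    im = min(levels, key=lambda a: abs(z.imag - a))
    return complex(re, im)


X_true = complex(1, -1)              # transmitted constellation point
small = X_true + (0.3 + 0.4j)        # distortion stays inside the decision region
large = X_true + (1.2 - 0.1j)        # distortion crosses into the neighbor 3 - 1j

diff_small = small - nearest_point(small)   # equals the distortion 0.3 + 0.4j
diff_large = large - nearest_point(large)   # does not equal the distortion 1.2 - 0.1j
```

Only in the first case does the difference between the equalized sample and its rounded value reproduce the distortion, which is why the choice of tones matters in what follows.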


3. Development of Compressive Sensing Models with No Tone Reservation

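With the addition of the general distortion vector $\boldsymbol{\mathcal{D}}$ to the data vector $\boldsymbol{\mathcal{X}}$, we expect that part of the data samples will be severely perturbed such that they fall out of their true decision regions. Let $\langle \hat{\mathcal{X}}(k) \rangle$ denote the decoded data sample corresponding to $\hat{\mathcal{X}}(k)$; then the true decision region for $\mathcal{X}(k)$ is defined as $\mathcal{Q}(k) = \{\mathcal{X}(k) + \mathcal{U} \in \mathbb{C} : \langle \mathcal{X}(k) + \mathcal{U} \rangle = \mathcal{X}(k)\}$, where $\mathcal{U}$ is a perturbation which, when added to $\mathcal{X}(k)$, keeps it in its true decision region. Moreover, denote by $\Omega_{\mathcal{T}} = \{k \in \Omega : \langle \mathcal{X}(k) + \mathcal{D}(k) \rangle = \mathcal{X}(k)\}$ the subset of data tones in $\Omega = \{1, 2, \ldots, N\}$ in which the perturbations do not cause data samples to cross their true decision regions. Let $\Omega_{\mathcal{F}} = \Omega \setminus \Omega_{\mathcal{T}}$ be its complement. Over the data tones of $\Omega_{\mathcal{T}}$, the equality $\langle \hat{\mathcal{X}}(k) \rangle = \mathcal{X}(k)$ is true and hence, from (4) and (5), $\hat{\boldsymbol{\mathcal{X}}}_{\Omega_{\mathcal{T}}} - \langle \hat{\boldsymbol{\mathcal{X}}} \rangle_{\Omega_{\mathcal{T}}} = \boldsymbol{\mathcal{D}}_{\Omega_{\mathcal{T}}}$. This does not hold over the complement set $\Omega_{\mathcal{F}}$, at which $\langle \hat{\boldsymbol{\mathcal{X}}} \rangle_{\Omega_{\mathcal{F}}} \neq \boldsymbol{\mathcal{X}}_{\Omega_{\mathcal{F}}}$. More generally, we can write

$$\boldsymbol{\mathcal{D}} = \mathbf{S}_{\Omega_{\mathcal{T}}} \left(\hat{\boldsymbol{\mathcal{X}}} - \langle \hat{\boldsymbol{\mathcal{X}}} \rangle\right) + \mathbf{S}_{\Omega_{\mathcal{F}}} \left(\hat{\boldsymbol{\mathcal{X}}} - \boldsymbol{\mathcal{X}}\right), \qquad (6)$$

where $\mathbf{S}_{\Omega_{\mathcal{T}}}$ is an $N \times N$ diagonal binary selection matrix, with $|\Omega_{\mathcal{T}}|$ ones at locations along its diagonal specified by the tone set $\Omega_{\mathcal{T}}$. It extracts the elements of the vector $\hat{\boldsymbol{\mathcal{X}}} - \langle \hat{\boldsymbol{\mathcal{X}}} \rangle$ according to the tone set $\Omega_{\mathcal{T}}$ while nulling the others. The matrix $\mathbf{S}_{\Omega_{\mathcal{F}}}$ is similarly defined with ones along the diagonal specified by the set $\Omega_{\mathcal{F}}$. It is easy to see that $\mathbf{S}_{\Omega_{\mathcal{T}}} \mathbf{S}_{\Omega_{\mathcal{F}}} = \mathbf{0}$. Practically speaking, $\Omega_{\mathcal{T}}$ constitutes the larger part of the tone set $\Omega$. An essential part of OFDM signal recovery obviously consists of finding $\Omega_{\mathcal{F}}$ and correcting the distortions over $\Omega_{\mathcal{F}}$ to finally reach the state $\Omega_{\mathcal{T}} = \Omega$.

¹ While $\mathcal{A}_m$ refers to the $m$th constellation point ($1 \le m \le M$), we reserve $\langle \hat{\mathcal{X}}(k) \rangle$ to denote the nearest constellation point corresponding to the $k$th received data sample $\hat{\mathcal{X}}(k)$. Furthermore, note that the true constellation point corresponding to $\hat{\mathcal{X}}(k)$ is $\mathcal{X}(k)$.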

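From (5) we have at the receiver,

$$\hat{\boldsymbol{\mathcal{X}}} = \boldsymbol{\mathcal{X}} + \boldsymbol{\mathcal{D}}, \qquad (7)$$

which is the analog estimate $\hat{\boldsymbol{\mathcal{X}}}$ of the data vector $\boldsymbol{\mathcal{X}}$ affected by the distortion $\boldsymbol{\mathcal{D}}$. We define $\boldsymbol{\mathcal{E}} = \boldsymbol{\mathcal{X}} - \langle \hat{\boldsymbol{\mathcal{X}}} \rangle$ to be a vector that is nonzero at locations where the decoded estimate at the receiver, $\langle \hat{\boldsymbol{\mathcal{X}}} \rangle$, differs from the data vector $\boldsymbol{\mathcal{X}}$. From the discussion above, we see that

$$\hat{\mathcal{X}}(k) - \langle \hat{\mathcal{X}}(k) \rangle = \begin{cases} \mathcal{D}(k), & \text{if } \langle \hat{\mathcal{X}}(k) \rangle = \mathcal{X}(k), \\ \mathcal{D}(k) + \mathcal{E}(k), & \text{if } \langle \hat{\mathcal{X}}(k) \rangle \neq \mathcal{X}(k), \end{cases} \qquad (8)$$

which allows us to write

$$\mathbf{S}_{\Omega_{\mathcal{T}}} \left(\hat{\boldsymbol{\mathcal{X}}} - \langle \hat{\boldsymbol{\mathcal{X}}} \rangle\right) = \mathbf{S}_{\Omega_{\mathcal{T}}} \boldsymbol{\mathcal{D}} = \mathbf{S}_{\Omega_{\mathcal{T}}} \mathbf{F} \mathbf{c} + \mathbf{S}_{\Omega_{\mathcal{T}}} \boldsymbol{\Lambda}^{-1} \boldsymbol{\mathcal{Z}}. \qquad (9)$$

Note that we do not require all of $\Omega_{\mathcal{T}}$ to recover $\mathbf{c}$. Rather, we only require an arbitrary subset $\Omega_m \subseteq \Omega_{\mathcal{T}} \subset \Omega$ of cardinality $m = |\Omega_m| \le |\Omega_{\mathcal{T}}|$ to correctly recover $\mathbf{c}$ by CS ($m$ and $|\Omega_m|$ will be used interchangeably as appropriate to denote the number of measurements). As a result, we can replace the equation above with $\mathbf{S}_{\Omega_m} (\hat{\boldsymbol{\mathcal{X}}} - \langle \hat{\boldsymbol{\mathcal{X}}} \rangle) = \mathbf{S}_{\Omega_m} \boldsymbol{\mathcal{D}} = \mathbf{S}_{\Omega_m} \mathbf{F} \mathbf{c} + \mathbf{S}_{\Omega_m} \boldsymbol{\Lambda}^{-1} \boldsymbol{\mathcal{Z}}$, where $\mathbf{S}_{\Omega_m}$ is also an $N \times N$ diagonal binary selection matrix defined in a similar way as $\mathbf{S}_{\Omega_{\mathcal{T}}}$ above. We write the above equation simply as $\boldsymbol{\mathcal{Y}}' = \boldsymbol{\Psi} \mathbf{c} + \boldsymbol{\mathcal{Z}}'$, where $\boldsymbol{\Psi} = \mathbf{S}_{\Omega_m} \mathbf{F}$, $\boldsymbol{\mathcal{Z}}' = \mathbf{S}_{\Omega_m} \boldsymbol{\Lambda}^{-1} \boldsymbol{\mathcal{Z}}$, and $\boldsymbol{\mathcal{Y}}' = \mathbf{S}_{\Omega_m} (\hat{\boldsymbol{\mathcal{X}}} - \langle \hat{\boldsymbol{\mathcal{X}}} \rangle)$, which denotes the observation vector of the differences over the tones in $\Omega_m$, nulled at the discarded measurements. This leads us to a pilotless CS model²

$$\boldsymbol{\mathcal{Y}}'_{\Omega_m} = \boldsymbol{\Psi}_{\Omega_m} \mathbf{c} + \boldsymbol{\mathcal{Z}}'_{\Omega_m}, \qquad (10)$$

where $\boldsymbol{\mathcal{Y}}'_{\Omega_m}$ is the $|\Omega_m|$-dimensional vector composed of the nonzero coefficients in $\boldsymbol{\mathcal{Y}}'$.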


By inspecting (10), we notice that $\mathbf{c}$ is an $N$-dimensional sparse vector in the time domain, corresponding to the difference between the time representations of the OFDM signal $\mathbf{x}$ and its clipped counterpart $\bar{\mathbf{x}}$. The matrix $\boldsymbol{\Psi}_{\Omega_m} \in \mathbb{C}^{m \times N}$ is obtained by $m$ random row extractions from the $N \times N$ Fourier matrix according to $\Omega_m$ (the cause of randomness is discussed later). The $m$-dimensional vector $\boldsymbol{\mathcal{Y}}'_{\Omega_m}$ is the corresponding partial frequency-domain observation that we use to estimate $\mathbf{c}$, contaminated by the Gaussian noise vector, $\boldsymbol{\mathcal{Z}}'_{\Omega_m}$.

This is a standard model in CS. Note, however, that the parametrization by $\Omega_m$ actually defines a huge set of $2^N$ possible models³. In the forthcoming sections we will discuss in detail how to determine a proper model from all these possibilities. For the time being, we assume that an appropriate $\Omega_m$ is chosen, and $\mathbf{c}$ could therefore be recovered using any CS technique, be it convex programming, greedy pursuit, or iterative thresholding, and a very flexible region for tradeoff exists in regard to the performance and complexity of these techniques.
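As a small illustration of the row-extraction structure behind (10), the following Python sketch builds the rows of the unitary Fourier matrix indexed by a tone subset and applies them to a sparse time-domain vector. The vector length, the two nonzero samples, the tone subset, and the helper names `dft_row` and `partial_dft` are illustrative assumptions; the sketch is noiseless and does not attempt any recovery.

```python
import cmath
import math


def dft_row(k, N):
    """Row k of the unitary N x N Fourier matrix, F(k, l) = exp(-j*2*pi*k*l/N) / sqrt(N)."""
    return [cmath.exp(-2j * math.pi * k * l / N) / math.sqrt(N) for l in range(N)]


def partial_dft(c, tones):
    """Apply only the Fourier rows indexed by `tones` to the time-domain vector c."""
    N = len(c)
    return [sum(f * v for f, v in zip(dft_row(k, N), c)) for k in tones]


N = 16
c = [0j] * N
c[3] = -0.4 + 0.2j           # hypothetical clipping event at sample 3
c[11] = 0.1 - 0.5j           # hypothetical clipping event at sample 11

omega_m = [1, 4, 7, 10, 13]              # hypothetical selected tones, m = 5
y = partial_dft(c, omega_m)              # noiseless m-dimensional observation
C_full = partial_dft(c, range(N))        # all N frequency coefficients of c
```

The observation `y` is simply the sub-vector of `C_full` on the selected tones; a CS solver would work from `y` and the corresponding rows alone.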

In this paper, we use two different schemes of CS to recover $\mathbf{c}$ from the developed CS model in (10), one from the convex relaxation group and the other from greedy pursuit methods. More specifically, the first is an adaptation of the least absolute shrinkage and selection operator (LASSO) [10] to this problem called the weighted and phase-augmented LASSO (WPAL) [6]. It incorporates phase information and clipping likelihood available from the data in the time domain to improve distortion recovery performed in the frequency domain. Specifically, we know that $\mathbf{c}$ is composed of just the clipped portions of the transmitted signal $\mathbf{x}$, so at clipping locations $\angle c(k) = -\angle x(k)$ (that is, $c(k)$ is antipodal to $x(k)$). We also know that the closer $|\bar{x}(k)|$ is to the value of the clipping threshold $\gamma$, the higher the likelihood that $\mathbf{c}$ had an active coefficient at $k$. This additional information is incorporated in the CS algorithm in the form of weighting to improve its performance. Therefore, we define $\mathbf{w} = \big|\, |\hat{\bar{\mathbf{x}}}| - \gamma \,\big|$ to be such a weighting vector to the $\ell_1$-norm of $\mathbf{c}$ in the LASSO, where $\hat{\bar{\mathbf{x}}}$ refers to the estimated received clipped signal. We further define the diagonal phase matrix $\boldsymbol{\Theta}_c = -\exp\left(\mathrm{diag}(j \boldsymbol{\theta}_{\hat{\bar{\mathbf{x}}}})\right)$ such that

$$\hat{\mathbf{c}}^{\mathrm{WPAL}} = \boldsymbol{\Theta}_c \left|\hat{\mathbf{c}}^{\mathrm{WPAL}}\right|.$$

With these two variables defined, the optimization problem we solve becomes

$$\left|\hat{\mathbf{c}}^{\mathrm{WPAL}}\right| = \arg\min_{|\mathbf{c}| \in \mathbb{R}^N} \mathbf{w}^{\mathrm{T}} |\mathbf{c}| \quad \text{s.t.} \quad \left\| \boldsymbol{\mathcal{Y}}'_{\Omega_m} - \boldsymbol{\Psi}_{\Omega_m} \boldsymbol{\Theta}_c |\mathbf{c}| \right\|_2 \le \epsilon \qquad (11)$$

² The reason we stress that the CS model does not reduce transmission rate is that there have been previous alternate attempts by the authors [6] and others [7] to use compressive sensing in a tone-reservation setting which required significant reduction in data-rate.

³ The reason is that we do not know $\Omega_m$ or even $|\Omega_m|$, and so we might have to search over all the subsets of $N$ tones, giving us a total of $2^N$ models like (10) to choose from.


for some noise-dependent parameter $\epsilon$. The other technique is the fast Bayesian matching pursuit (FBMP) by Schniter et al. [11], chosen for its superior performance and efficiency when a relatively large number of measurements is available, which is indeed the case compared to the tone reservation cases proposed in [6] and [7].

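Finally, once $\hat{\mathbf{c}}^{\mathrm{CS}}$, the CS estimate of $\mathbf{c}$, has been obtained through any of the abovementioned schemes, we use (3) and (4) to obtain

$$\tilde{\boldsymbol{\mathcal{X}}} = \hat{\boldsymbol{\mathcal{X}}} - \hat{\boldsymbol{\mathcal{C}}}^{\mathrm{CS}} = \boldsymbol{\mathcal{X}} + \left[\boldsymbol{\mathcal{C}} - \hat{\boldsymbol{\mathcal{C}}}^{\mathrm{CS}}\right] + \boldsymbol{\Lambda}^{-1} \boldsymbol{\mathcal{Z}} = \boldsymbol{\mathcal{X}} + \boldsymbol{\mathcal{E}}^{\mathrm{CS}} + \boldsymbol{\Lambda}^{-1} \boldsymbol{\mathcal{Z}} \triangleq \boldsymbol{\mathcal{X}} + \boldsymbol{\Delta}, \qquad (12)$$

where $\boldsymbol{\mathcal{E}}^{\mathrm{CS}} \triangleq \boldsymbol{\mathcal{C}} - \hat{\boldsymbol{\mathcal{C}}}^{\mathrm{CS}}$ and $\boldsymbol{\Delta} \triangleq \boldsymbol{\mathcal{E}}^{\mathrm{CS}} + \boldsymbol{\Lambda}^{-1} \boldsymbol{\mathcal{Z}}$. $\tilde{\boldsymbol{\mathcal{X}}}$ could be used to determine the data vector $\boldsymbol{\mathcal{X}}$ exactly, provided that no $\Delta(k)$ causes crossing of $\mathcal{X}(k)$ out of its ML decoding region (this issue will come up in Section 4.3). Our subsequent objective is to scrutinize the general conditioning of the model itself by supplying our most reliable observations to the generic CS algorithm.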


4. Cherry Picking 

An essential question now is how to select among the $2^N$ possible $\Omega_m$ (or $\binom{N}{m}$ if $m$ is fixed) in order to compute $\hat{\mathbf{c}}^{\mathrm{CS}}$. In this connection we devise a reliability function $\mathfrak{R}$ which associates a reliability estimate with each tone and thus lets us determine the $m$ most reliable tones to construct $\Omega_m$. A general strategy of CS techniques is to select these $m$ tones randomly for near-optimal performance [8]. Although possible in our scenario, such a strategy neglects the fact that our observations vary in their credibility and attest to whether or not they represent true frequency-domain measurements of $\mathbf{c}$⁴.

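⁴ The measurements at tones $\Omega_m$ in (10) are used to determine $\mathbf{c}$ and hence $\boldsymbol{\mathcal{C}}$. However, these measurements truly represent $\boldsymbol{\mathcal{C}}$ only if $\langle \hat{\mathcal{X}}(k) \rangle = \mathcal{X}(k)$ for $k \in \Omega_m$. We cannot ascertain that this is true, but we can calculate its probability.

Since we deal with each tone separately in what follows, we henceforth drop the $k$ index while preserving the italic notation to emphasize the scalar-wise operations in this section. With the receiver risking faulty decisions, it must devise a procedure to select the most reliable set of observations over which to sense. To this end, consider the estimate $\hat{\mathcal{X}}$ and the nearest constellation point $\langle \hat{\mathcal{X}} \rangle$. The latter is in general surrounded by eight points which belong either to the set of nearest neighbors (NN) or to the set of next nearest neighbors (NNN), as illustrated in Fig. 1. Now the selection of the most reliable set of observations could be done based on the relative posterior probability that $\mathcal{D}$ equals $\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle$ to the probability that it equals some other difference $\hat{\mathcal{X}} - \mathcal{A}_m$, $\mathcal{A}_m \neq \langle \hat{\mathcal{X}} \rangle$. For example, let

$$\mathfrak{R} = \frac{f_{\mathcal{D}}\left(\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle\right)}{f_{\mathcal{D}}\left(\hat{\mathcal{X}} - \mathcal{A}_{\mathrm{NN}}\right)}.$$

The higher the value of $\mathfrak{R}$, the higher the reliability that $\mathcal{X} = \langle \hat{\mathcal{X}} \rangle$ relative to the event that $\hat{\mathcal{X}}$ should decode to $\mathcal{A}_{\mathrm{NN}}$. From (7), we can see that $f(\hat{\mathcal{X}} \mid \mathcal{X} = \mathcal{A}_m) = f_{\mathcal{D}}(\hat{\mathcal{X}} - \mathcal{A}_m)$, so $\mathfrak{R}$ is the ratio of the likelihoods of the two competing decisions. Now, as mentioned earlier, we model $\mathcal{D}$ to be circularly symmetric Gaussian with variance $\sigma_{\mathcal{D}}^2$; then

$$f_{\mathcal{D}}(\mathcal{D}) = \frac{1}{\pi \sigma_{\mathcal{D}}^2} \exp\left(-\frac{1}{\sigma_{\mathcal{D}}^2} |\mathcal{D}|^2\right)$$

and we can write

$$\mathfrak{R} = \exp\left(-\frac{1}{\sigma_{\mathcal{D}}^2} \left(\left|\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle\right|^2 - \left|\hat{\mathcal{X}} - \mathcal{A}_{\mathrm{NN}}\right|^2\right)\right).$$

Intuitively, the minimum certainty occurs at the boundary of the decision region and attains $\mathfrak{R}_{\min} = 1$. At such tones, we would be highly skeptical of whether $\mathcal{D} = \hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle$ or $\mathcal{D} = \hat{\mathcal{X}} - \mathcal{A}_{\mathrm{NN}}$, and we would hence supply a plausibly false measurement to the CS algorithm. To avoid such unreliable measurements, assume we only choose the tones with respective perturbations $\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle$ that are confined in the complex plane to a disk of radius $r_0$ (i.e., $|\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle| < r_0$). In such a case, given the minimum distance between any two constellation points ($d_{\min}$), the minimum reliability would increase to $\mathfrak{R}_{\min} = f_{\mathcal{D}}(r_0) / f_{\mathcal{D}}(d_{\min} - r_0)$ when the complex scalar $\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle$ points in the direction of the nearest neighbor $\mathcal{A}_{\mathrm{NN}}$, and to $\mathfrak{R} = f_{\mathcal{D}}(r_0) / f_{\mathcal{D}}(\sqrt{2} d_{\min} - r_0)$ for the next nearest neighbor $\mathcal{A}_{\mathrm{NNN}}$ when it points in the direction of a decision region's corner. So while both $\hat{\mathcal{X}}_1$ and $\hat{\mathcal{X}}_2$ have the same distance $r_0$ from $\langle \hat{\mathcal{X}}_1 \rangle = \langle \hat{\mathcal{X}}_2 \rangle$, $\hat{\mathcal{X}}_1$ has a higher reliability than $\hat{\mathcal{X}}_2$ as it is farther from the nearest neighbor. Fig. 2 shows a part of the constellation to illustrate the idea. This suggests a need to factor in the direction or phase of the perturbation when assessing its reliability, in addition to the magnitude-dependent pdf $f_{\mathcal{D}}$. Defined axiomatically, the reliability of a measurement at each tone is then a function $\mathfrak{R}(\cdot)$ that maps a 3-tuple $\left(|\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle|, \theta_{\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle}, \Lambda\right)$ into $\mathbb{R}_{[1, \infty]}$ (i.e., a function of the magnitude of the observation, its phase, and the channel gain at that tone).

Figure 1: An illustration of nearest neighbors (NN) and next nearest neighbors (NNN) of $\langle \hat{\mathcal{X}} \rangle$.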
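The dependence of the likelihood ratio on the direction of the perturbation can be checked numerically. The following Python sketch assumes a Gaussian distortion with an illustrative variance, a decided point at the origin, and a grid spacing of 2; the function name `reliability` and all numeric values are assumptions made for this example.

```python
import cmath
import math


def reliability(x_hat, decided, competitor, sigma2):
    """Ratio f_D(x_hat - decided) / f_D(x_hat - competitor) for a circular Gaussian f_D."""
    return math.exp(-(abs(x_hat - decided) ** 2 - abs(x_hat - competitor) ** 2) / sigma2)


d_min = 2.0       # assumed spacing between adjacent constellation points
sigma2 = 0.5      # assumed distortion variance
r0 = 0.6          # common perturbation magnitude

decided = 0j                                   # nearest constellation point
neighbors = [complex(d_min, 0), complex(0, d_min), complex(d_min, d_min)]

toward_nn = decided + r0                                      # points at a nearest neighbor
toward_corner = decided + r0 * cmath.exp(1j * math.pi / 4)    # points at a region corner

worst_nn = min(reliability(toward_nn, decided, a, sigma2) for a in neighbors)
worst_corner = min(reliability(toward_corner, decided, a, sigma2) for a in neighbors)
```

With these numbers the worst-case ratio is larger for the perturbation that points toward the corner than for the one that points toward the nearest neighbor, even though both have magnitude `r0`; on the decision boundary the ratio drops to 1.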



Figure 2: Reliability is not only a function of the magnitude but also the phase 
of the observation. 

Ultimately, we would choose our measurements according to the tones associated with the highest $m$ reliability outputs, i.e.,

$$\Omega_m = \arg \left\{\mathfrak{R}_{i:N}\right\}_{i=N-m+1}^{N} \qquad (13)$$

to sense over, where $\mathfrak{R}_{i:N}$ denotes the $i$th-order statistic in the vector $\boldsymbol{\mathfrak{R}}$ [12]. With this selection of $\Omega_m$, the locations of the measurement tones correspond to the indices of the highest $m$ order statistics of $N$ random variables in $\boldsymbol{\mathfrak{R}}$. As mentioned previously, each of these variables is a function of the 3-tuple above, and whereas the first two are uncorrelated across the tones, this does not generally hold for the third, i.e., $\Lambda(k)$.
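The selection rule in (13) amounts to keeping the indices of the $m$ largest reliability values. A minimal Python sketch, with a hypothetical helper name `select_tones` and made-up reliability values:

```python
def select_tones(reliabilities, m):
    """Return the indices of the m largest reliability values, in ascending tone order."""
    order = sorted(range(len(reliabilities)), key=lambda k: reliabilities[k], reverse=True)
    return sorted(order[:m])


# Hypothetical per-tone reliabilities for a five-tone block.
R = [3.0, 40.0, 1.2, 15.0, 7.5]
omega_m = select_tones(R, 2)   # tones 1 and 3 carry the two largest values
```

Sorting costs on the order of N log N operations per block; a partial selection would do equally well when m is much smaller than N.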

In fact, assuming $L_h$ channel taps with a uniform power-delay profile, the absolute autocorrelation of the channel across tones $k$ and $l$ can be expressed as $|\mathrm{E}[\Lambda(k) \Lambda^{*}(l)]| = \mathrm{sinc}\left(\pi L_h (k - l) / N\right)$ [13]. Hence, only for sufficiently large $L_h$ can we assume that the channel gains are uncorrelated. Otherwise, the set of reliable tones $\Omega_m$ deviates from the uniformly random tone selection model typically assumed in the literature mis], and reliable tones would instead come in clusters corresponding to strong channel gains. The efficiency of CS in this case might be reduced⁵.

⁵ Nonetheless, some methods such as FBMP are not much hindered by this fact El¬
4.1. Criteria for Evaluating $\mathfrak{R}$

Using the reasoning based on the scalar-wise likelihood ratio defined in (13), an exact expression for the reliability could be a direct generalization of (13), namely,

$$\mathfrak{R}^{\mathrm{exact}} = \frac{f_{\mathcal{D}}\left(\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle\right)}{\sum_{m=0,\; \mathcal{A}_m \neq \langle \hat{\mathcal{X}} \rangle}^{M-1} f_{\mathcal{D}}\left(\hat{\mathcal{X}} - \mathcal{A}_m\right)}. \qquad (14)$$

Unfortunately, this pursuit of exact reliability computation is inefficient since it requires $NM$ evaluations of $f_{\mathcal{D}}(\cdot)$, which grows with the constellation size $M$, even though the probability of a perturbation exceeding the first tier of eight surrounding neighbors (i.e., the nearest and next nearest neighbors illustrated in Fig. 1) is insignificant. As such, we can truncate the computation above to the first tier, denoting its result by $\mathfrak{R}^{\mathrm{trunc}}$, with a minor effect on the performance.

Two simpler reliability functions are also worth mentioning. The first is solely based on the probability of the perturbation (i.e., $f_{\mathcal{D}}(|\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle|)$) and is hence completely blind to the direction of the perturbation in the constellation's plane, while the other one intuitively takes this extra phase information into account by defining a square centered at $\langle \hat{\mathcal{X}} \rangle$ as the reliable region, hence having the ability to favor perturbations with larger magnitudes if $\theta_{\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle}$ is close to $\pi/4$, i.e., if they point to the next nearest neighbor. We denote these reliability functions by $\mathfrak{R}^{\circ}$ and $\mathfrak{R}^{\square}$, respectively, motivated by the geometric shape they define. In the next section, a more rigorous approach is taken to justify when such simpler methods can be used.


4.2. Analysis of Truncated and Approximate Reliability Criteria

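Dropping the tone index, assume that $\hat{\mathcal{X}}$ is an observation that falls among four points in an $M$-QAM constellation such that $\hat{\mathcal{X}} \in \mathcal{Q}$. Let $\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle = r e^{j\theta}$ be the polar representation of this point with the origin at $\langle \hat{\mathcal{X}} \rangle$, such that $r = |\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle| \in [0, \tfrac{\sqrt{2}}{2} d_{\min}]$ and $\theta = \theta_{\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle} \in [0, \pi/2]$. We are interested in a more abstract expression of the truncated Bayesian reliability function, $\mathfrak{R}^{\mathrm{trunc}}$, one that defines its output by only acting on $\hat{\mathcal{X}} - \langle \hat{\mathcal{X}} \rangle$ while taking the relative position in the constellation implicitly into account as well⁶.

By the definition of $r$, and by referring to Fig. 3, which again shows a part of the constellation diagram (similar to Fig. 2), the distances between $\hat{\mathcal{X}}$ and the other three competing constellation points are

$$r_1(r, \theta) = \sqrt{r^2 - 2 r d_{\min} \cos\theta + d_{\min}^2}, \qquad (15)$$

$$r_2(r, \theta) = \sqrt{r^2 - 2 r d_{\min} (\cos\theta + \sin\theta) + 2 d_{\min}^2}, \qquad (16)$$

and

$$r_3(r, \theta) = \sqrt{r^2 - 2 r d_{\min} \sin\theta + d_{\min}^2}. \qquad (17)$$

We neglect detailing their phases, $\theta_1$, $\theta_2$, and $\theta_3$, since they have no effect on the results, although $r_1$, $r_2$, and $r_3$ are clearly functions of $r$ and $\theta$ as portrayed in Fig. 3. In effect, $\mathfrak{R}^{\mathrm{trunc}}$ reduces to

$$\mathfrak{R}^{\mathrm{trunc}}(r, \theta) = \frac{f_{\mathcal{D}}(r)}{\sum_{i=1}^{3} f_{\mathcal{D}}(r_i)} = \frac{e^{-r^2/\sigma_{\mathcal{D}}^2}}{e^{-\left(r^2 - 2 r d_{\min} \cos\theta + d_{\min}^2\right)/\sigma_{\mathcal{D}}^2} + e^{-\left(r^2 - 2 r d_{\min} (\cos\theta + \sin\theta) + 2 d_{\min}^2\right)/\sigma_{\mathcal{D}}^2} + e^{-\left(r^2 - 2 r d_{\min} \sin\theta + d_{\min}^2\right)/\sigma_{\mathcal{D}}^2}}.$$

Canceling out the common function $f_{\mathcal{D}}(r)$ appearing in all terms above, and collecting common terms, yields

$$\mathfrak{R}^{\mathrm{trunc}}(r, \theta) = \left(e^{-d_{\min}^2/\sigma_{\mathcal{D}}^2} \left[e^{2 d_{\min} r \cos\theta/\sigma_{\mathcal{D}}^2} + e^{\left(2 d_{\min} r (\cos\theta + \sin\theta) - d_{\min}^2\right)/\sigma_{\mathcal{D}}^2} + e^{2 d_{\min} r \sin\theta/\sigma_{\mathcal{D}}^2}\right]\right)^{-1} = \left(\beta \left[\alpha^{\cos\theta} + \alpha^{\sin\theta} + \beta \alpha^{\cos\theta + \sin\theta}\right]\right)^{-1} \qquad (18)$$

for the first quadrant, $\theta \in [0, \pi/2]$, in the complex plane, where $\alpha = \exp(2 d_{\min} r / \sigma_{\mathcal{D}}^2)$ and $\beta = \exp(-d_{\min}^2 / \sigma_{\mathcal{D}}^2)$. Although clearly a function of $r$, we will treat $\alpha$ as a constant (i.e., evaluated at a fixed magnitude, $r$) when we wish to focus on $\mathfrak{R}^{\mathrm{trunc}}$ as an explicit function of $\theta$, say $g(\theta)$. This function is symmetric about $\theta = \pi/4$ and exhibits quite complex behavior with $r$ and $\theta$, as indicated in Fig. 4⁷.

⁶ This could be similarly carried out for the non-truncated function $\mathfrak{R}^{\mathrm{exact}}$, albeit with an unnecessary inflation of expressions with hardly any additional insight.

Figure 4: $\mathfrak{R}^{\mathrm{trunc}}(r, \theta)$ in (18), normalized and plotted on the first quadrant for $\sigma_{\mathcal{D}}^2 = 0.2 d_{\min}$ and evaluated at $r = 0.1 d_{\min}, 0.2 d_{\min}, \ldots, 0.7 d_{\min}$. Note that $\theta$ varies from $0°$ to $90°$.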

Most importantly, our concern will be whether $\mathfrak{R}^{\mathrm{trunc}}$ is convex or concave with respect to $\theta$ at different regions of $r$. This is because $\mathfrak{R}^{\mathrm{trunc}}$ undertakes a fundamental shift in behavior as $r$ varies from 0 to $\tfrac{\sqrt{2}}{2} d_{\min}$, and its approximation by basic trigonometric functions and geometric objects such as squares and circles depends on whether it is convex or concave with respect to $\theta$. Notice first that when $r \ll \sigma_{\mathcal{D}}^2 / (2 d_{\min})$, $\alpha \approx 1$ and hence $\mathfrak{R}^{\mathrm{trunc}} \approx \left(\beta [2 + \beta]\right)^{-1}$ becomes relatively isotropic (i.e., independent of $\theta$) and therefore akin to $\mathfrak{R}^{\circ}$. Referring to Fig. 4, for example, the polar curve of the normalized reliability function evaluated at the smallest magnitude, $\mathfrak{R}^{\mathrm{trunc}}(\theta)|_{r = 0.1 d_{\min}}$, confirms this observation. (Refer to the curve with blue circular markers in Fig. 4.)
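The closed form in (18) can be cross-checked against the direct ratio of Gaussian pdfs built from the three distances in (15)-(17). The Python sketch below does this for assumed values $d_{\min} = 1$ and $\sigma_{\mathcal{D}}^2 = 0.2$; the function names are hypothetical, and the normalizing constant of the pdf is omitted because it cancels in the ratio.

```python
import math


def r_trunc_direct(r, theta, d_min, sigma2):
    """f_D(r) / sum_i f_D(r_i), with squared distances taken from (15)-(17)."""
    c, s = math.cos(theta), math.sin(theta)
    d1 = r * r - 2 * r * d_min * c + d_min ** 2
    d2 = r * r - 2 * r * d_min * (c + s) + 2 * d_min ** 2
    d3 = r * r - 2 * r * d_min * s + d_min ** 2
    num = math.exp(-r * r / sigma2)
    den = sum(math.exp(-d / sigma2) for d in (d1, d2, d3))
    return num / den


def r_trunc_closed(r, theta, d_min, sigma2):
    """Closed form (beta * [alpha^cos + alpha^sin + beta * alpha^(cos+sin)])^-1 from (18)."""
    alpha = math.exp(2 * d_min * r / sigma2)
    beta = math.exp(-d_min ** 2 / sigma2)
    c, s = math.cos(theta), math.sin(theta)
    return 1.0 / (beta * (alpha ** c + alpha ** s + beta * alpha ** (c + s)))


d_min, sigma2 = 1.0, 0.2    # assumed values for illustration
small_r = [r_trunc_closed(0.05, t, d_min, sigma2) for t in (0.0, math.pi / 4)]
larger_r = [r_trunc_closed(0.2, t, d_min, sigma2) for t in (0.0, math.pi / 4)]
```

For these assumed values, the function is slightly lower along the diagonal than along the axis at `r = 0.05`, and slightly higher along the diagonal at `r = 0.2`, which is the kind of change in angular behavior the analysis sets out to locate.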

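In fact, as will be shown shortly, $\mathfrak{R}^{\mathrm{trunc}}(r, \theta)$ will tend to even disfavor perturbations along $\pi/4$ (or $\pi/4 + n\pi/2$, $n = 1, \ldots, 3$ in general) until it shifts gears and takes on its concave behavior with respect to $\theta$. For instance, the curve of the normalized function $\mathfrak{R}^{\mathrm{trunc}}(\theta)|_{r = 0.2 d_{\min}}$ appearing in Fig. 4 is actually convex (therefore assigning slightly higher reliability to perturbations in the direction of the next nearest neighbors having the same magnitude $0.2 d_{\min}$), whereas the curves of the normalized function $\mathfrak{R}^{\mathrm{trunc}}(\theta)|_{r = 0.3 d_{\min}}$ and beyond are concave. To pinpoint the location of this behavioral shift, we need to find

$$\hat{r} = \left\{ r \in \left[0, \tfrac{\sqrt{2}}{2} d_{\min}\right] : \frac{\partial^2 \mathfrak{R}^{\mathrm{trunc}}(r, \theta)}{\partial \theta^2} = 0 \right\}, \qquad (19)$$

where $\partial^2 \mathfrak{R}^{\mathrm{trunc}} / \partial \theta^2$ is expressed as

$$\frac{\partial^2 \mathfrak{R}^{\mathrm{trunc}}(r, \theta)}{\partial \theta^2} = \frac{2 \left(\beta \ln\alpha (\cos\theta - \sin\theta) \alpha^{\sin\theta + \cos\theta} - \ln\alpha \sin\theta\, \alpha^{\cos\theta} + \ln\alpha \cos\theta\, \alpha^{\sin\theta}\right)^2}{\beta \left(\beta \alpha^{\sin\theta + \cos\theta} + \alpha^{\sin\theta} + \alpha^{\cos\theta}\right)^3}$$
$$\qquad - \frac{\beta \ln^2\alpha (\cos\theta - \sin\theta)^2 \alpha^{\sin\theta + \cos\theta} + \beta \ln\alpha (-\sin\theta - \cos\theta) \alpha^{\sin\theta + \cos\theta} - \ln\alpha \sin\theta\, \alpha^{\sin\theta} - \ln\alpha \cos\theta\, \alpha^{\cos\theta} + \ln^2\alpha \cos^2\theta\, \alpha^{\sin\theta} + \ln^2\alpha \sin^2\theta\, \alpha^{\cos\theta}}{\beta \left(\beta \alpha^{\sin\theta + \cos\theta} + \alpha^{\sin\theta} + \alpha^{\cos\theta}\right)^2}. \qquad (20)$$

⁷ The reliability on other quadrants is obtained by a basic reflection of $g(\theta)$ about the vertical and horizontal axes, or more simply by mapping $\theta \in [0, 2\pi]$ back to $\theta \in [0, \pi/2]$.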
Clearly, this is a daunting task. However, we can reduce it to finding the root, $\hat{r}$, which satisfies (19) when evaluated at $\theta = \pi/4$, since our main concern is whether or not $\mathfrak{R}^{\mathrm{trunc}}(r, \theta)$ will be tapered along $\theta = \pi/4$, and this will fortunately result in many cancelations in (20) due to symmetry about this point. Pursuing this reduces (19) to solving

$$\sqrt{2} \left(\beta \alpha^{1/\sqrt{2}} + 1\right) - \ln\alpha = 0. \qquad (21)$$
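The reduction to (21) can be verified by hand. The following sketch writes the bracketed term of (18) as an auxiliary function $h(\theta)$, a name introduced here only for this check, and evaluates its second derivative at $\theta = \pi/4$, where the first derivative vanishes by symmetry.

```latex
\begin{align*}
h(\theta) &= \alpha^{\cos\theta} + \alpha^{\sin\theta} + \beta\,\alpha^{\cos\theta+\sin\theta},
\qquad \mathfrak{R}^{\mathrm{trunc}} = \big(\beta\,h(\theta)\big)^{-1},\\
h'(\pi/4) &= 0 \quad\Longrightarrow\quad
\left.\frac{\partial^2 \mathfrak{R}^{\mathrm{trunc}}}{\partial\theta^2}\right|_{\theta=\pi/4}
 = -\frac{h''(\pi/4)}{\beta\,h^2(\pi/4)},\\
h''(\pi/4) &= 2\,\alpha^{1/\sqrt{2}}\Big(\tfrac{1}{2}\ln^2\alpha - \tfrac{1}{\sqrt{2}}\ln\alpha\Big)
 - \sqrt{2}\,\beta\,\alpha^{\sqrt{2}}\ln\alpha\\
 &= \alpha^{1/\sqrt{2}}\,\ln\alpha\,\Big(\ln\alpha - \sqrt{2} - \sqrt{2}\,\beta\,\alpha^{1/\sqrt{2}}\Big).
\end{align*}
```

Since $\alpha^{1/\sqrt{2}} \ln\alpha > 0$ for $r > 0$, the second derivative vanishes exactly when $\sqrt{2}(\beta \alpha^{1/\sqrt{2}} + 1) - \ln\alpha = 0$, which is the condition stated in (21).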


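Expanding this into the original parameters implies that we need to find $\hat{r}$ such that

$$\frac{\sqrt{2} d_{\min}}{\sigma_{\mathcal{D}}^2} \hat{r} - e^{\left(\sqrt{2} d_{\min} \hat{r} - d_{\min}^2\right)/\sigma_{\mathcal{D}}^2} - 1 = 0, \qquad (22)$$

which cannot be solved explicitly in terms of $\hat{r}$. Rather, by means of a proper substitution, it can be put in the implicit form $g(\hat{r}) e^{g(\hat{r})} = q$, where $q$ is independent of $\hat{r}$, and expressed using the primary branch $W_0$ of Lambert's W-function El-. The explicit solution to the previous form can be expressed as $g(\hat{r}) = W_0(q)$, and the desired explicit expression for $\hat{r}$ is obtained by back-substitution (refer to Appendix B for details). Ultimately, we can show that

$$\hat{r} = \frac{-\sqrt{2} \sigma_{\mathcal{D}}^2}{2 d_{\min}} \left[W_0\left(-e^{1 - d_{\min}^2/\sigma_{\mathcal{D}}^2}\right) - 1\right]. \qquad (23)$$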


Furthermore, as $W_0(0) = 0$ and $d_{\min}/2 > \sigma_{\mathcal{D}}^2$, it is clear that $d_{\min}^2/\sigma_{\mathcal{D}}^2 > 2 d_{\min}$, and that the argument of $W_0$ quickly approaches zero from the left as $\sigma_{\mathcal{D}}^2$ diminishes, resulting in the following accurate approximation of (23):

$$\hat{r} \approx \frac{\sqrt{2} \sigma_{\mathcal{D}}^2}{2 d_{\min}} \qquad (24)$$

for small $\sigma_{\mathcal{D}}^2$ relative to $d_{\min}/2$. Fig. 5 plots (23) and its approximation (24) as functions of $\sigma_{\mathcal{D}}^2$. Using the approximation for simplicity, $\hat{r}$ then splits the behavior of $\mathfrak{R}^{\mathrm{trunc}}$ into two regions, supported approximately by the intervals $\left[0, \tfrac{\sqrt{2} \sigma_{\mathcal{D}}^2}{2 d_{\min}}\right]$ and $\left(\tfrac{\sqrt{2} \sigma_{\mathcal{D}}^2}{2 d_{\min}}, \tfrac{\sqrt{2}}{2} d_{\min}\right]$. This result explains why $\mathfrak{R}^{\mathrm{trunc}}(r, \theta)$ in Fig. 4 first resembles a circular shape akin to $\mathfrak{R}^{\circ}$ and then inflates along the diagonals, deforming into a square-shaped reliability region, $\mathfrak{R}^{\square}$, as can be seen when $r = 0.3 d_{\min}$ in Fig. 4 (see the black curve with square-shaped markers). Subsequently, as the magnitude, $r$, of the perturbation increases and approaches the decision boundaries, $\mathfrak{R}^{\mathrm{trunc}}(r, \theta)$ inflates outwards, resembling pointy leaves that can be modeled as $\mu + (1 - \mu) \cos(4\theta + \pi)$, where $\mu \in [1/2, 1]$ so that $\mu > 1 - \mu > 0$. The analysis also provides restrictions for when square-like reliability regions suggested in the literature (such as [15]) can be justifiable.
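The quality of the approximation (24) can be examined numerically without a Lambert-W routine by solving (22) directly. The Python sketch below uses plain bisection over $[0, \tfrac{\sqrt{2}}{2} d_{\min}]$; the parameter values and the function names `shift_equation`, `shift_radius`, and `shift_radius_approx` are assumptions made for illustration.

```python
import math


def shift_equation(r, d_min, sigma2):
    """Left-hand side of (22) as a function of the candidate radius r."""
    u = math.sqrt(2) * d_min * r / sigma2
    return u - math.exp(u - d_min ** 2 / sigma2) - 1.0


def shift_radius(d_min, sigma2, iterations=200):
    """Root of (22) on [0, d_min / sqrt(2)] by bisection (negative at 0, positive at the top)."""
    lo, hi = 0.0, d_min / math.sqrt(2)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if shift_equation(mid, d_min, sigma2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shift_radius_approx(d_min, sigma2):
    """Approximation (24): sqrt(2) * sigma2 / (2 * d_min)."""
    return math.sqrt(2) * sigma2 / (2 * d_min)


d_min = 1.0
root_a, approx_a = shift_radius(d_min, 0.2), shift_radius_approx(d_min, 0.2)
root_b, approx_b = shift_radius(d_min, 0.05), shift_radius_approx(d_min, 0.05)
```

For the assumed variance 0.2 the bisection root lies slightly above the approximation, and for 0.05 the two agree to many digits, in line with the claim that (24) tightens as the variance shrinks.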



Figure 5: Comparison of $\hat{r}$ in (23) and its approximation in (24) as a function of $\sigma_{\mathcal{D}}^2$, expressed as a ratio of $d_{\min}/2$ (at $d_{\min} = 1$).

4.3. Dual-stage construction of $\Omega_m$

The reader will notice that our primary objective so far in selecting flm was 
based on minimizing the probability of incorrect measurements, i.e., 

rim = argmaxPr C • (25) 

This is no doubt a necessary choice to preserve the success of the recovery 
algorithm as a whole, although we know that a more generic criterion, that is, 
one that is not at risk of using incorrect observations, would seek the tones with 
the maximum CNF0 i.e., (see @) 


®In other words, in a generic CS algorithm in which all measurements are 100% reliable, 
the most effective measurements are the ones which maximize the CNR. 

























Obviously, there is a conflicting interest between (25) and (26), as the former frequently seeks smaller perturbations $\hat{\mathcal{X}} - \langle\hat{\mathcal{X}}\rangle$, since they are the most likely to equal $\mathcal{D}$, while the latter seeks the largest perturbations to maximize the CNR for enhanced estimation performance.

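This prompts us to consider a second recovery stage (i.e., another CS iteration) that uses (26) and produces a new subset of selected tones, denoted by $\Omega_{cs}$. The second recovery stage takes in $\Omega_m$ obtained from (25) and uses it with any of the mentioned CS recovery algorithms to get a CS estimate of $\mathcal{C}$, denoted by $\mathcal{C}^{cs}$. This lets us achieve the corrected decoding decision $\langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle$, allowing us to have a higher confidence that $\Delta = \mathcal{D} - \mathcal{C}^{cs}$ equals $\hat{\mathcal{X}} - \mathcal{C}^{cs} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle$ compared to the primary assumption that $\mathcal{D} = \hat{\mathcal{X}} - \langle\hat{\mathcal{X}}\rangle$. This is because the decoding error in $\langle\hat{\mathcal{X}}\rangle = \langle\mathcal{X} + \mathcal{D}\rangle = \langle\mathcal{X} + \mathcal{C} + \Lambda^{-1}\mathcal{Z}\rangle$ depends on the value of $\mathcal{C}$, whereas the error in $\langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle = \langle\mathcal{X} + \Delta\rangle = \langle\mathcal{X} + \mathcal{E}^{cs} + \Lambda^{-1}\mathcal{Z}\rangle$ depends on the estimation error $\mathcal{E}^{cs} = \mathcal{C} - \mathcal{C}^{cs}$ of $\mathcal{C}$, which is expected to be smaller than $\mathcal{C}$ itself. These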
results follow from (|^ and 

As illustrated in Figs. 6 and 7, it is now possible to use the carriers that have the largest values of the perturbations, $\hat{\mathcal{X}} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle$ (or even the carriers with the largest values of $\mathcal{C}^{cs}$), as the new CS measurements, without worrying much about how close $\hat{\mathcal{X}}$ is to the decision boundaries. Note, however, that we never have access to $\mathcal{C}$ or $\mathcal{E}^{cs}$, and therefore we can rely only on observable variables, such as $\mathcal{D}$ and $\Delta$, to practically maximize the CNR. More importantly, these variables themselves are not always obtainable, since it is not necessarily the case that $\mathcal{D} = \hat{\mathcal{X}} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle$ or that $\Delta = \hat{\mathcal{X}} - \mathcal{C}^{cs} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle$ (which is the main reason that we repeat CS over a subset of measurements). Instead, we have to rely on the observable variables, $\hat{\mathcal{X}} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle$ and $\mathcal{C}^{cs}$, for this task. The availability of these two observable variables grants us the flexibility of computing the CNR in two different ways, each suited to a different scenario. The first is suited to a high SNR and low CS estimation quality (see Fig. 6), while the second is suited to a low SNR and high CS estimation quality (see Fig. 7). Dropping the coefficient index, $k$, these are:

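1. High SNR, large $\mathcal{E}^{cs}$:

$$\mathrm{CNR} = \frac{|\hat{\mathcal{X}} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle|^2}{|\hat{\mathcal{X}} - \mathcal{C}^{cs} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle|^2} = \frac{|\mathcal{D}|^2}{|\Delta|^2} \quad \text{iff } \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle = \mathcal{X}$$

$$= \frac{|\mathcal{C} + \Lambda^{-1}\mathcal{Z}|^2}{|\mathcal{E}^{cs} + \Lambda^{-1}\mathcal{Z}|^2} \asymp \frac{|\mathcal{C}|^2}{|\mathcal{E}^{cs}|^2}, \quad (27)$$

where the symbol $\asymp$ means that the expression on its left-hand side is asymptotically equal to the expression on its right-hand side.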













Figure 6: Computing CNR for the case of high SNR and low CS estimation 
quality 


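2. Low SNR, small $\mathcal{E}^{cs}$:

$$\mathrm{CNR} = \frac{|\mathcal{C}^{cs}|^2}{|\hat{\mathcal{X}} - \mathcal{C}^{cs} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle|^2} = \frac{|\mathcal{C} - \mathcal{E}^{cs}|^2}{|\Lambda^{-1}\mathcal{Z} + \mathcal{E}^{cs}|^2} \asymp \frac{|\mathcal{C}|^2}{|\Lambda^{-1}\mathcal{Z}|^2}. \quad (28)$$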

Although the second ratio more vividly resembles the CNR defined in (26), the first ratio is more effective in this work, as the inherent complexity of CS-based methods justifies itself in severe clipping scenarios and hence with an expectedly higher CS error (i.e., large $\mathcal{E}^{cs}$). Consequently, we select the differential measurements corresponding to the maximum ratio^


$$\Omega_{cs} = \arg\left\{\mathrm{CNR}_{(i)}\right\}_{i=N-|\Omega_{cs}|+1}^{N} \quad (29)$$


^Obviously, the number of tones $|\Omega_{cs}|$ need not be equivalent to the original number $|\Omega_m|$, and a wealth of possibilities emerges in relating these two parameters for optimal performance.

Figure 7: Computing CNR for the case of low SNR and high CS estimation quality


and the new CS model is 

y^cs = Sncs{X-{X-C^^))+Z'^as 

= ^HCsC + Sqcs, (30) 

which produces an improved estimate, This new estimate of the clipping 

distortion from a different subset of reliable and stronger measurements can 
then be subtracted from y, and a revised set of decoding decisions, — 
can be made. 
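
The second-stage tone selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `nearest_point` and `second_stage_tones`, the `levels` grid used for the hard decision on a square QAM constellation, and the small `eps` guard are all introduced here for illustration. The sketch ranks tones by the first (high-SNR) ratio, which only uses observable quantities, and keeps the largest ones.

```python
def nearest_point(value, levels):
    """Hard decision: round a complex sample to the nearest square-QAM point."""
    re = min(levels, key=lambda a: abs(value.real - a))
    im = min(levels, key=lambda a: abs(value.imag - a))
    return complex(re, im)


def second_stage_tones(x_hat, c_cs, levels, n_tones, eps=1e-12):
    """Pick the n_tones carriers with the largest ratio of the first form.

    x_hat : equalized frequency-domain samples (list of complex)
    c_cs  : first-stage CS estimate of the clipping distortion (list of complex)
    """
    cnr = []
    for xh, cc in zip(x_hat, c_cs):
        corrected = nearest_point(xh - cc, levels)  # <X_hat - C_cs>
        num = abs(xh - corrected) ** 2              # |X_hat - <X_hat - C_cs>|^2
        den = abs(xh - cc - corrected) ** 2         # |X_hat - C_cs - <X_hat - C_cs>|^2
        cnr.append(num / (den + eps))               # eps guards a zero residual
    ranked = sorted(range(len(cnr)), key=lambda k: cnr[k])  # ascending ratio
    return sorted(ranked[-n_tones:]), cnr
```

The returned index list plays the role of the second-stage tone set, and its size is a free design parameter, as the footnote above points out.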

4.4. On selecting the cardinality $|\Omega_m|$

The discussion above assumes throughout that the observations supplied to the CS algorithm are true measurements of the actual perturbations caused by clipping and additive noise. Although this is a mild restriction in practice, a guarantee must nonetheless be established that any $\Omega_m$ and $|\Omega_m|$ selected as described above will not result in CS failure. This requires that the number of measurements $|\Omega_m|$ be both small enough to minimize the probability of incorrect measurements and large enough relative to the sparsity level of $c$ to meet the recovery bounds of CS.

To this end, we derive a simple lower bound on the probability of $\Omega_m \subset \Omega_T$ by deriving a lower bound on the probability that $\mathcal{D}$ is indeed equal to $\hat{\mathcal{X}} - \langle\hat{\mathcal{X}}\rangle$. We study this for two cases: when $\hat{\mathcal{X}}$ is observed within a disk of radius $r_0$ from $\langle\hat{\mathcal{X}}\rangle$, i.e., $\Omega_m = \{k : |\hat{\mathcal{X}}(k) - \langle\hat{\mathcal{X}}(k)\rangle| < r_0\}$, and when it is observed within a square of side length $2r_0$ centered at the QAM symbols $\mathcal{X}_i$. We focus on the first case since it is more difficult to estimate, and directly give the result of the second since it is comparably straightforward. To this end, define a safe region, $\mathcal{S}_i = \{\mathcal{X}_i + U \in \mathbb{C} : |U| < r_0\}$, for decoding $\hat{\mathcal{X}}(k)$ within its square ML decision region, $\mathcal{Q}_i$, and denote the collection of all these safe regions by $\mathbb{S} = \bigcup_i \mathcal{S}_i$.

Figure 8: Illustrating how the integral over the shaded region upper-bounds the integral over $\mathcal{S}_i$.

Our objective is to select $|\Omega_m|$ such that $\Pr(\Omega_m \subset \Omega_T \mid |\Omega_m|)$ is high given a minimum amount of required measurements for CS success.

Dropping the tone index, this will require finding $\Pr(\mathcal{D} \in \mathbb{S})$, which requires evaluating an integral of $f_{\mathcal{D}}$ over (non-centered) discrete disks in the complex plane (see Fig. 8). Since this is difficult, we will use an upper bound based on evaluating $f_{\mathcal{D}}$ over centered disks that cover these regions and then slicing out the irrelevant regions. More specifically, the integral $\int_{\mathcal{D} \in \mathcal{S}_i} f_{\mathcal{D}}$ over the disk $\mathcal{S}_i$ of a nearest neighbor will be bounded by the integral over the area highlighted by the shaded region in Fig. 8. This region covers the difference between the outer and inner sectors defined by radii $d_{\min} + r_0$ and $d_{\min} - r_0$, respectively, and a common angle, $\theta_s = 2\sin^{-1}(r_0/d_{\min})$. In effect, the area of $\mathcal{S}_i$ is strictly less than $\frac{\theta_s}{2\pi}\left[\pi(d_{\min} + r_0)^2 - \pi(d_{\min} - r_0)^2\right]$ and therefore $\Pr(\mathcal{D} \in \mathcal{S}_i) < \frac{\theta_s}{2\pi}\left(F_{|\mathcal{D}|}(d_{\min} + r_0) - F_{|\mathcal{D}|}(d_{\min} - r_0)\right)$ for the nearest neighbor. Consequently,




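$$\Pr(\Omega_m \subset \Omega_T \mid |\Omega_m|) = \Pr\Big(\bigcap_{i=1}^{|\Omega_m|} \Omega_m(i) \in \Omega_T\Big)$$

$$= \Pr\big(\langle\hat{\mathcal{X}}\rangle = \mathcal{X} \,\big|\, |\hat{\mathcal{X}} - \langle\hat{\mathcal{X}}\rangle| < r_0\big)^{|\Omega_m|} = \Pr\big(\mathcal{D} = \hat{\mathcal{X}} - \langle\hat{\mathcal{X}}\rangle \,\big|\, \mathcal{D} \in \mathbb{S}\big)^{|\Omega_m|}$$

$$= \left(\frac{\Pr(|\mathcal{D}| < r_0,\ \mathcal{D} \in \mathbb{S})}{\Pr(\mathcal{D} \in \mathbb{S})}\right)^{|\Omega_m|} = \left(\frac{\Pr(|\mathcal{D}| < r_0)}{\Pr(\mathcal{D} \in \mathbb{S})}\right)^{|\Omega_m|}$$

$$> \left(\frac{F_{|\mathcal{D}|}(r_0)}{F_{|\mathcal{D}|}(r_0) + \frac{4}{\pi}\sin^{-1}\!\left(\frac{r_0}{d_{\min}}\right)\left[F_{|\mathcal{D}|}(d_{\min} + r_0) - F_{|\mathcal{D}|}(d_{\min} - r_0)\right]}\right)^{|\Omega_m|} \quad (31)$$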

|n„| 


-(d^ - +r 


' dmin ^ Cr|, / / 


32) 


In the case where the average distortion is large and square regions S° = 
S° of side-length 2ro are used, pursuing the same logic above we just 


replace the ratio in (31) with 


Pr(I?G S°) 
Pr(X> e S°) 


i- 2 Q(^) 


1 -2Q(;^) 


-f4 


Qi 


dmin-Tc 


)-Qi 


*^min~f~'P o 


(33) 


and raise it to the power $|\Omega_m|$ to obtain $\Pr(\Omega_m \subset \Omega_T \mid |\Omega_m|)$, where $Q$ is the familiar tail probability function. The user must then choose $|\Omega_m^{\tau}| = \arg\max_{|\Omega_m|}\,[\Pr(\Omega_m \subset \Omega_T \mid |\Omega_m|) > \tau]$, where $\tau$ is selected so as to supply as much information to the CS algorithm as possible while remaining in a safe region of correct measurements. Furthermore, given a clipping threshold, $\gamma$, we have an expected sparsity level, $\mathrm{E}[|\mathcal{I}_c|]$, and variance $\sigma^2_{|\mathcal{I}_c|}$, which need to be taken into account when using sparse recovery techniques [S]. We will denote this minimum required number of frequency observations to recover an $|\mathcal{I}_c|$-sparse vector in time by $|\Omega_m^{\gamma}|$ to stress its strong dependence on $\gamma$, and take $|\Omega_m| = \max(|\Omega_m^{\gamma}|, |\Omega_m^{\tau}|)$. The same can be done with the optional second stage $|\Omega_{cs}|$.
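
The cardinality rule above can be illustrated numerically. The sketch below evaluates the per-tone ratio of the disk-based bound (31) and returns the largest cardinality whose bound stays at or above a threshold `tau`. It rests on assumptions that are not taken from the text: $|\mathcal{D}|$ is modeled as Rayleigh (i.e., $\mathcal{D}$ circular complex Gaussian with $\mathrm{E}|\mathcal{D}|^2$ = `sigma2`), the comparison uses "at or above" rather than a strict inequality, `r0` is assumed positive and smaller than `d_min`, and all function names are hypothetical.

```python
import math


def rayleigh_cdf(r, sigma2):
    """CDF of |D|, ASSUMING D is circular complex Gaussian with E|D|^2 = sigma2."""
    return 1.0 - math.exp(-r * r / sigma2) if r > 0.0 else 0.0


def per_tone_bound(r0, d_min, sigma2):
    """Ratio inside the disk-based bound: F(r0) / (F(r0) + sector-ring leakage)."""
    inner = rayleigh_cdf(r0, sigma2)
    ring = rayleigh_cdf(d_min + r0, sigma2) - rayleigh_cdf(d_min - r0, sigma2)
    leak = (4.0 / math.pi) * math.asin(r0 / d_min) * ring  # four nearest neighbors
    return inner / (inner + leak)


def largest_safe_cardinality(r0, d_min, sigma2, tau, n_max):
    """Largest n <= n_max such that per_tone_bound ** n is still >= tau."""
    p = per_tone_bound(r0, d_min, sigma2)
    n = 0
    while n < n_max and p ** (n + 1) >= tau:
        n += 1
    return n
```

A caller would then combine the result with the CS-driven minimum, e.g. `max(n_gamma, largest_safe_cardinality(...))`, where `n_gamma` stands for the sparsity-dependent requirement and must be supplied separately.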

Suppose, however, that after taking all the protective measures thus far, an incorrect measurement was nonetheless supplied to the CS algorithm. Does this result in CS failure? Luckily, in this application the answer is no. Recall first the decoding-error vector $\mathcal{E} = \mathcal{X} - \langle\hat{\mathcal{X}}\rangle$ introduced earlier. When a decoding error is made at the $k$th coefficient, an incorrect measurement $\hat{\mathcal{X}}(k) - \langle\hat{\mathcal{X}}(k)\rangle = \mathcal{X}(k) + \mathcal{D}(k) - \langle\hat{\mathcal{X}}(k)\rangle = \mathcal{D}(k) + \mathcal{E}(k)$ is supplied, and it follows that the incorrect measurement has no impact on the performance of CS. Note that the nonzero entries of $\mathcal{E}$ are quantized and bounded, since $\mathcal{X}$ and $\langle\hat{\mathcal{X}}\rangle$ both take values in the same finite constellation. Furthermore, assuming most equalized coefficients $\hat{\mathcal{X}}(k)$ are decoded correctly, $\mathcal{E}$ is also sparse.

The general differential model (i.e., the one that is not confined to the carriers in $\Omega_m$) becomes

$$\hat{\mathcal{X}} - \langle\hat{\mathcal{X}}\rangle = \mathcal{C} + \Lambda^{-1}\mathcal{Z} + \mathcal{E}. \quad (34)$$

The first term added to $\mathcal{C}$ is the (dense) Gaussian noise vector, while the second is a structured-noise term. Equation (34) matches the model in [16], and it is demonstrated therein how it is possible to recover $\mathcal{C}$ from such noise via variations in the CS algorithm. The main results of the paper are summarized in the algorithm listed in Table 1.


input: y, A, 
output: X = (X — 

1. Compute X, {X), \X- (^)|, dx-(x)’ l^ml, \^7n 

2. Case: Severe clipping, for k = 1,..., N 


9\{k) (d(a“"® + ; a = e , P = e 

3. Case: Mild clipping, for k = 1,..., N, 

if \X{k) - {X_{k))\ < IH(fc) ^ UikMik) - {X{k))) 

else iR{k)^ /i>{fc)(T(A:) - (A(A:))) • + (1 - fi) cos(46i^(^)_(^(^)^ 

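4. $|\Omega_m| = \max(|\Omega_m^{\gamma}|, |\Omega_m^{\tau}|)$

5. Use any sparse recovery method (e.g., WPAL or FBMP) on $\hat{\mathcal{X}} - \langle\hat{\mathcal{X}}\rangle$ over $\Omega_m$ and decode via $\langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle$

6. (Optional):

a. $\mathrm{CNR} = |\hat{\mathcal{X}} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle|^2 \,/\, |\hat{\mathcal{X}} - \mathcal{C}^{cs} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle|^2$

b. Select $\Omega_{cs}$ using CNR

c. Perform CS on $\hat{\mathcal{X}} - \langle\hat{\mathcal{X}} - \mathcal{C}^{cs}\rangle$ over $\Omega_{cs}$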


Table 1: The proposed method 


5. Simulation Results 

In this section, we perform several experiments to show the effectiveness of the proposed technique. The methods proposed in this paper were tested on a 256-subcarrier OFDM signal, modulated using 64-QAM. The signal was subject to a block-fading, frequency-selective Rayleigh channel model, with varying noise and clipping ratios (CR), defined as CR $= \gamma/\sigma_x$ [TT]. Here, $\sigma_x$ is the standard deviation of the OFDM signal. Available packages
for convex programming [TS] and Fast Bayesian Matching Pursuit (FBMP) [TT] 
were used to implement the sparse-recovery algorithms. The undistorted phase 
property (as utilized in i) is utilized while implementing FBMP and hence 
the modified version is termed phase augmented-FBMP (PAFBMP). Lastly, we 
refer to the second stage defined in Section 4.3 as a corrective action on the first estimation operation, and label its output by C-WPAL or C-PAFBMP for the cases when WPAL [6] and PAFBMP are used, respectively.

Figure 9: NSR vs. CR for the various reliability functions defined in Section 4. $E_b/N_0 = 20$ dB, $|\Omega_m| = 64$, $\mu_1 = 0.65$, $\mu_2 = 0.95$.

Figure 10: Achievable rate as a function of the clipping ratio CR $= \gamma/\sigma_x$


5.1. Comparison of Reliability Criteria

In the first experiment, the reliability criteria proposed in this work are compared, including $\Re^{\mathrm{trunc}}$. As a performance metric, we use the normalized success rate (NSR), defined as $|\{k : \langle\hat{\mathcal{X}}(k)\rangle = \mathcal{X}(k),\ k \in \Omega_m\}| / |\Omega_m|$.

Figure 11: Relative run time (max. = 100%) at different clipping levels. $m_{\mathrm{PAFBMP}} = 0.25N$, $m_{\mathrm{C\text{-}PAFBMP}} = 0.39N$, $m_{\mathrm{WPAL}} = 0.25N$, $m_{\mathrm{C\text{-}WPAL}} = 0.39N$, $R = 5$.

The NSR indicates how many of the $|\Omega_m|$ tones favoured by a particular reliability criterion were actually within their correct decision regions. Fig. 9 shows the result of this experiment. The results were plotted against a varying CR while the 64 most reliable carriers were sought, keeping $E_b/N_0$ fixed at 20 dB. It is expected that as CR is increased, all reliability criteria will tend to improve. The simulation results confirm this intuition and also confirm the conjecture that the truncated reliability computation comes at little cost compared to using the exact reliability function (14). Furthermore, penalizing the circular reliability function, $\Re^{\circ}$, with $\mu_2 = 0.95$ (which results in a square-like function such as the curve plotted in Fig. for $r = 0.3\,d_{\min}$) was more effective than with the smaller value $\mu_1 = 0.65$ at milder CR levels. On the other hand, with severe CR, smaller values of $\mu$ were better. This observation is highlighted by showing enlarged versions of the graph for severe and milder CRs.
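
For concreteness, the NSR metric defined above can be computed as follows. This is a minimal sketch with hypothetical names (`normalized_success_rate`, `decoded`, `transmitted`, `selected`); it assumes the hard decisions and the transmitted symbols are available as index-aligned sequences, which is only possible in simulation.

```python
def normalized_success_rate(decoded, transmitted, selected):
    """Fraction of selected tones whose hard decision equals the sent symbol."""
    if not selected:
        raise ValueError("selected tone set must be non-empty")
    hits = sum(1 for k in selected if decoded[k] == transmitted[k])
    return hits / len(selected)
```

An NSR of 1 means every tone picked by the reliability criterion carried a correct decision, so every differential measurement passed to the CS stage was a true observation of the perturbation.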


5.2. Achievable Rate 

In this experiment, the ultimate performance measure considered was the 
achievable rate with and without the proposed method. Assuming ergodicity 
over the subcarriers, this rate can be expressed as ^ log 2(1 + 

for the un-mitigated case, and as log 2(1 -I- |AipCT^/(|Aip)(T^_^- + 

for the case when an estimate C of C is obtained by an arbitrary method [2]- 
As a benchmark, Oracle-LS is utilized, where the support of the clipping signal is perfectly known and the active elements are estimated using an LS estimate at the receiver. We compared the clipping mitigation results obtained with the WPAL (11) [S] and PAFBMP sparse recovery algorithms over the exact reliability function. The results in Fig. 10 show the superior performance of C-PAFBMP using the most reliable 39% of carriers and the ability of C-WPAL to beat all the techniques at CRs around 1.5 using the most reliable 39% of carriers.


5.3. Complexity 

A practical comparison of relative run times for the tested algorithms is reported in Fig. 11. All times are scaled and represented as a percentage of the maximum time required for recovery. We observe that PAFBMP has the least complexity among the compared schemes and that this complexity reduces with increased CR.


6. Conclusion 

A novel method for nonlinear distortion mitigation using pilotless sparse recovery techniques was proposed. The method exploits the sparsity of the distortion in the time domain to fully recover the signal without being influenced by incorrect ML-decoding decisions. The method adaptively senses over reliable subsets of observations of the distortion in the frequency domain to perform the recovery. A new method of computing the reliability of each observation independently of the other $M - 1$ candidates within a constellation was also proposed and tested. Through simulations, it is verified that the proposed scheme provides favourable results for clipping signal recovery and achieves a rate close to that of Oracle-LS based recovery.

Appendix A. Deriving $f_{\mathcal{D}}$

Beginning with the fact that 

cr|, = cr^-h A”^A"V|, (A.l) 

can be made, and we subsequently work out the energy of the sparse vector, c, 
which is a compound random variable. By total expectation we have 

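$$\mathrm{E}[\|c\|^2] = \mathrm{E}_{|\mathcal{I}_c|}\big[\mathrm{E}[\|c\|^2 \mid |\mathcal{I}_c|]\big] = \mathrm{E}_{|\mathcal{I}_c|}\big[|\mathcal{I}_c|\,\mathrm{E}[|c|^2]\big] = \mathrm{E}[|\mathcal{I}_c|]\,\mathrm{E}[|c|^2], \quad (A.2)$$

where we have dropped the index in $\mathrm{E}[|c|^2]$ to denote an arbitrary nonzero coefficient of $c$. Using this result, we can see that $\sigma_{\mathcal{C}}^2 = \mathrm{E}[\|c\|^2]/N = \mathrm{E}[|c|^2]\,\mathrm{E}[|\mathcal{I}_c|]/N$ by Parseval's energy conservation law and an ergodicity assumption. Moreover,

$$\mathrm{E}[|c|^2] = \mathrm{E}\big[|x|^2 \,\big|\, |x| > \gamma\big] - 2\gamma\,\mathrm{E}\big[|x| \,\big|\, |x| > \gamma\big] + \gamma^2, \quad (A.3)$$

and so we derive the pdf of $|x|$ given $|x| > \gamma$ to be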






'ril 


where F| 2 ,|( 7 ) = . Computing the terms in (A.3) using (A.4) we get 


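$$\mathrm{E}[|c|^2] = 2\sigma_{|x|}^2 - 2\sqrt{\frac{\pi}{2}}\,\gamma\,\sigma_{|x|}\,e^{\gamma^2/(2\sigma_{|x|}^2)}\left[1 - \operatorname{erf}\!\left(\frac{\gamma}{\sqrt{2}\,\sigma_{|x|}}\right)\right]. \quad (A.5)$$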


Lastly, since $|\mathcal{I}_c|$ is binomial, $\mathrm{E}[|\mathcal{I}_c|] = N F_{|x|}(\gamma)$. Substituting these last two expressions into (A.2) gives $\sigma_{\mathcal{C}}^2$, which in turn produces (A.1), and this parameter characterizes $f_{\mathcal{D}}$.
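
The conditional second moment of the clipped excess can be checked numerically. The sketch below compares a closed form against a direct midpoint-rule integration of $\mathrm{E}[(|x| - \gamma)^2 \mid |x| > \gamma]$. It assumes, as an illustration only, that $|x|$ is Rayleigh distributed with scale `sigma` (so that $\mathrm{E}|x|^2 = 2\sigma^2$). The function names, the step count and the integration span are hypothetical choices made here.

```python
import math


def mean_sq_clip_closed_form(gamma, sigma):
    """Closed form of E[|c|^2], ASSUMING a Rayleigh envelope with scale sigma."""
    tail = 1.0 - math.erf(gamma / (math.sqrt(2.0) * sigma))
    return (2.0 * sigma ** 2
            - 2.0 * math.sqrt(math.pi / 2.0) * gamma * sigma
            * math.exp(gamma ** 2 / (2.0 * sigma ** 2)) * tail)


def mean_sq_clip_numeric(gamma, sigma, steps=20000, span=12.0):
    """Midpoint-rule evaluation of E[(|x| - gamma)^2 | |x| > gamma]."""
    upper = gamma + span * sigma              # truncation point of the integral
    h = (upper - gamma) / steps
    total = 0.0
    for i in range(steps):
        r = gamma + (i + 0.5) * h
        pdf = (r / sigma ** 2) * math.exp(-r * r / (2.0 * sigma ** 2))
        total += (r - gamma) ** 2 * pdf * h
    # Normalize by the clipping probability Pr(|x| > gamma).
    return total / math.exp(-gamma ** 2 / (2.0 * sigma ** 2))
```

With no clipping threshold (`gamma = 0`) both functions reduce to the unconditional second moment $2\sigma^2$, which is a quick sanity check on the parametrization.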


Appendix B. Tailoring the Lambert W-Function


Given the equation $y = xe^{x}$, which for reasons stated below we call the canonical form, it is desired to solve explicitly for $x$. Unfortunately, this cannot be done using elementary operations. Instead, the solution can be expressed in terms of the Lambert W-function, where $x = W(y)$ is the solution to the canonical form, and where we can thus equivalently write $y = W(y)e^{W(y)}$.

The function is generally multivalued. If we restrict its argument, $y$, to be real, then it produces two outputs for each point on the supporting interval $y \in [-e^{-1}, 0]$, which is our interval of interest. One of the two outputs of $W(y)$ is $\ge -1$ and is referred to as the primary branch, $W_0(y)$, while the second is $\le -1$ and is referred to as the secondary branch, $W_{-1}(y)$. It will soon be clear that only the primary branch is needed (hence an injective mapping between $y$ and $x$ is retained). In any case, the function can be found iteratively by Newton's method; for instance, $x_{j+1} = x_j - \frac{x_j e^{x_j} - y}{e^{x_j}(x_j + 1)}$. The problem at hand, as expressed in (22), is more complex than the canonical form. Nonetheless, it can be reduced to this form by a clever substitution. First express (22) compactly as

$$e^{a\hat{r} + b} = c\hat{r} + d, \quad (B.1)$$


where a = y/2dminlcr%, b = c = y ^dm in/o'n = a, and d = -I. 

Letting $p = -a(\hat{r} + d/c)$ and substituting into (B.1) gives $-\frac{a}{c}e^{-ad/c + b} = pe^{p}$. Comparing with the canonical form, the solution to the equation can be expressed as $p = W_0\!\left(-\frac{a}{c}e^{-ad/c + b}\right)$. Back-substitution to the four parameters in (B.1) returns $\hat{r} = -\frac{1}{a}W_0\!\left(-\frac{a}{c}e^{-ad/c + b}\right) - \frac{d}{c}$. Returning the values of $a$, $b$, $c$ and $d$ into this equation gives the final expression in (23).
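
The reduction above can be exercised numerically. The following sketch implements Newton's method for the primary branch and then solves an equation of the form $e^{ar + b} = cr + d$ through the substitution. The names `lambert_w0` and `solve_exp_linear`, the starting point `log1p(y)` and the tolerances are choices made for this illustration; the sketch only handles arguments strictly greater than $-1/e$ and assumes `a` and `c` are nonzero.

```python
import math


def lambert_w0(y, tol=1e-13, max_iter=200):
    """Principal branch W0(y) for y > -1/e via Newton's method on x*exp(x) - y."""
    if y <= -1.0 / math.e:
        raise ValueError("this sketch requires y > -1/e")
    x = math.log1p(y)  # lies at or to the right of W0(y), so Newton descends to it
    for _ in range(max_iter):
        ex = math.exp(x)
        step = (x * ex - y) / (ex * (x + 1.0))
        x -= step
        if abs(step) < tol:
            break
    return x


def solve_exp_linear(a, b, c, d):
    """Solve exp(a*r + b) = c*r + d for r using the principal branch."""
    arg = -(a / c) * math.exp(b - a * d / c)
    return -lambert_w0(arg) / a - d / c
```

For example, `solve_exp_linear(2.0, -4.0, 2.0, -1.0)` returns a root that can be verified by substituting it back into both sides of the equation.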